Molecular characterization of SARS-CoV-2 Omicron clade and clinical presentation in children

Since its emergence, SARS-CoV-2 Omicron clade has shown a marked degree of variability and different clinical presentation compared with previous clades. Here we demonstrate that at least four Omicron lineages circulated in children since December 2021, and studied until November 2022: BA.1 (33.6%), BA.2 (40.6%), BA.5 (23.7%) and BQ.1 (2.1%). At least 70% of infections concerned children under 1 year, most of them being infected with BA.2 lineages (n = 201, 75.6%). Looking at SARS-CoV-2 genetic variability, 69 SNPs were found to be significantly associated in pairs, (phi <  − 0.3 or > 0.3 and p-value < 0.001). 16 SNPs were involved in 4 distinct clusters (bootstrap > 0.75). One of these clusters (A23040G, A27259C, T23617G, T23620G) was also positively associated with moderate/severe COVID-19 presentation (AOR [95% CI] 2.49 [1.26–4.89] p-value: 0.008) together with comorbidities (AOR [95% CI] 2.67 [1.36–5.24] p-value: 0.004). Overall, these results highlight the extensive SARS-CoV-2 Omicron circulation in children, mostly aged < 1 year, and provide insights on viral diversification even considering low-abundant SNPs, finally suggesting the potential contribution of viral diversification in affecting disease severity.

Along the pandemic course, Severe Acute Respiratory Syndrome COronaVirus 2 (SARS-CoV-2) has evolved rapidly, accumulating single nucleotide polymorphisms (SNPs) and generating new variants characterized by different transmissibility, virulence, and immune evasion [1][2][3][4][5][6][7][8] .Some of these have been declared by WHO to be of particular concern due to their high transmissibility and impact on the general population.
Starting from the end of 2021, the Omicron clade has out-competed previous variants, rapidly becoming the dominant one.Since its emergence until the present, Omicron has undergone substantial genetic evolution, as evidenced by the identification of different lineages (BA.1, BA.2, BA.3, BA.4,BA.5) and their descendant and recombinant lines (XD, XE, XF, BQ etc.) [9][10][11][12] .These lineages share the same ancestor with the first, but at the same time are characterized by unique mutational patterns acquired over evolution.
In this regard, several studies have shown that the new amino acidic mutations at the spike protein characterizing Omicron clade increased transmissibility and evasion to different neutralizing antibodies (NAbs) [13][14][15][16] .Nonetheless, these mutations have not worsened clinical presentation, being frequently associated with less severe manifestations in the adult population [17][18][19] .In children, the dynamics of SARS-CoV-2 evolution are poorly studied, as is the potential clinical impact of unique mutational profiles.In our previous work, we demonstrated a sizeable circulation of different SARS-CoV-2 lineages in SARS-CoV-2 positive patients aged ≤ 12 years over the first four pandemic waves (from pandemic start to delta clade), but no significant associations were found between lineages and COVID-19 presentation, even if a lower number of moderate/severe cases were found during alpha-clade epidemic 20 .
It is well known that, in the evolutionary pathway, new advantageous mutations are selected and can become dominant.In line with this, restricting the study of the SARS-CoV-2 genome only to the level of consensus sequences may limit the complete knowledge and understanding of the evolutionary pathways of the virus, due From late December 2021 to early November 2022, 2831 new SARS-CoV-2 diagnoses were performed in pediatric subjects (< 12-year-old) at the Bambino Gesù Children Hospital IRCCS in Rome.For 1182 diagnoses, nasopharyngeal swabs characterized by cycle threshold (Ct) < 29 or Antigenic Cut off index (COI) > 1000 and demographic and clinical information were available and retrieved.Whole SARS-CoV-2 genome was performed for 713 randomly selected samples and was successfully obtained for 657 final samples.Demographic and clinical characteristics are reported in Table 1.

Distribution of SARS-CoV-2 lineages affecting paediatric population
The distribution of SARS-CoV-2 sequences against clinical characteristics and against the global context of SARS-CoV-2 (Pangolin https:// pango lin.cog-uk.io/) 22 is shown by the Maximum likelihood tree in Supplementary Fig. S1 and by the time-scale phylogeny in Fig. 1.Demographic and clinical characteristics of patients infected with SARS-CoV-2 against lineages are reported in Table 1.
Supplementary Table S1 reports the 102 SNPs characterized by a different prevalence across Omicron lineages (Supplementary Table S1).

Covariation profiles among SARS-CoV-2 SNPs
Statistically significant pairs of SNPs Looking at potential associations among SNPs, we found that 69 SNPs were involved in significant associations (Supplementary Table S2).Forty-six resided in non-structural proteins and the remaining 23 in structural ones.Seventeen SNPs were localized in spike positions, 13 in nsp3, and 10 in RNA-dependent RNA polymerase.

Clusters of correlated SNPs
By hierarchical clustering analysis, it was possible to identify 4 distinct clusters (bootstrap > 0.75) of SNPs, positively correlated among them (Fig. 3, Table 2).The first cluster (bootstrap = 0.99) includes 4 SNPs located in the spike protein (G22578A and G23642T), the membrane protein (C26577G) and RdRp (T15474G).Two of these SNPs (G23642T and T15474G) were present at low frequency and low abundance in BA.1, BA.2 and BA.5, while were detected in 100% of BQ.1.C26577G increased its prevalence and frequency among lineages, becoming high-abundant SNP in BA.5 and BQ.1.G22578 in the spike protein was known to be a highly abundant SNP detected in more than 90% of BA.5 and BQ.1 (Supplementary Table S1).
The second cluster (bootstrap = 0.99) involved two low-abundant SNPs (the non-synonymous C15157A and the synonymous G15168A) detected exclusively in the BA.1 and BA.2 lineages and both located in the RdRp region (Supplementary Table S1).
The third cluster (bootstrap = 0.96) is characterized by four SNPs located mainly in the spike and ORF6 gene.Three synonymous SNPs (A27259C, A23040G and T23617G) were exclusively detected in BA.1 and BA.2.The non-synonymous T23620G (never exceeding an abundance of 20%) in the spike protein decreased its prevalence from BA.1 to BA.5, while was detected in the 100% of BQ.1 (Supplementary Table S1).

Discussion
The Omicron wave resulted in an increased number of children infected by SARS-CoV-2 even though displaying a reduced incidence of severe manifestations 23,24 compared to Delta and pre-Delta clades.An in-depth genomic characterization of Omicron variability, taking into consideration both high and low abundance mutations, may address new perspectives on infection and pathogenesis modifications, also in the setting of long-term manifestations and Multisystem inflammatory syndrome (MIS-C), the latter known to be mostly related to young age [25][26][27] .
Here, our study confirmed the large circulation of SARS-CoV-2 Omicron in the paediatric population, with age less than 1 year appearing to be the most susceptible category to infection regardless of lineage.As already extensively reported, the accumulation of these mutations defines new lineages known to have increased infectivity, transmission, and immune escape 11,12,28 .
Our study also provided evidence of a decrease in low-abundant SNPs from BA.1 to BA.5 and BQ.1 lineages.Similar results were observed by intra-host genetic analysis, which revealed a lower number of mutations in BA.2.3 and BA.5 compared to BA.1 and BA.2 in the spike protein 30 .These results might hypothesize the role  www.nature.com/scientificreports/ of these mutations in immune responses, in line with evidence on the enhanced ability of the omicron clade, particularly the later lineages, to evade the immune system even in vaccinated subjects 31 .Moreover, a number of these SNPs were involved in mutational clusters together with high abundant and constitutive SNPs.The first complex mutational cluster identified involved the low abundant SNPs RdRp-G678G [nucleotide: T15474G], spike-A694S [nucleotide: G23642T].These SNPs showed increased prevalence across Omicron lineages.Specifically, their prevalence increased from 10% in the first Omicron lineages to 100% in the BQ.1 lineage, without showing a substantial difference in their reads' frequency.These SNPs clustered together with the highly abundant SNPs spike-G339D [nucleotide: G22578A] and Membrane-Q19E [nucleotide: C26577G].The first one, falling in the spike protein RBD domain, can increase the molecular flexibility of the glycoprotein 32 .The second, which resides in the N-terminal domain of the membrane protein, appears to destabilize the structure of the protein itself 33 .
The second cluster involved two low abundant SNPs localized in the RdRp (L576L [nucleotide: G15168A] and Q573K [nucleotide: C15157A]) and exclusively found in BA.1 and BA.2 lineages.Both SNPs reside in the finger domain of RdRp 34 .The close contact of Q573K with the residues involved in the active site and substrate/ template binding tunnel as well as with the residues directly involved in RdRp-inhibitors binding might suggest its involvement in replication capacity and drug interaction 35 .
The third cluster, found mainly in BA.1 and BA.2 lineages, was composed of the spike mutations S686R [nucleotide: T23620G], Q493R [nucleotide: A23040G], and R685R [nucleotide: T23617G] together with the ORF6-R20R [nucleotide: A27259C].The low abundant S686R mutation was detected mainly in BA.1 and BA.2 sequences, and in a minority of BA.5 (10.3% of total BA.5 with S686R), maintaining a reads frequency always below 20%.This mutation is localized close to the furin cleavage site, implicated in the replication and pathogenesis of SARS-CoV-2.A previous study showed that the amino acid change from polar, not charged (serine) to non-polar (glycine) at this position interfered in furin-type cleavages 36 .In our case, the new amino acid (Arginine) is positively charged, and therefore further insights are needed to define the role of this mutation (even if present at low reads frequency) in affecting or improving the recognition of furin cleavage site and in increasing SARS-CoV-2 infectivity.Q493R, located in the receptor binding domain of the spike protein (S1-RBD), increases binding affinity to angiotensin-converting enzyme 2 and reduces susceptibility to class 3 monoclonal antibodies and to bamlanivimab 37,38 .Spike-R685R [nucleotide: T23617G] is localized into furin cleavage site, implicated in replication and pathogenesis of SARS-CoV-2 39 .Finally, ORF6 exhibits critical antagonistic activity, preventing the antiviral innate immune response by inhibiting interferon β (IFN-β) production and blocking the expression of STAT1-activated genes for SARS-CoV-2 40 .
Multivariate logistic regression has further sustained these clusters suggesting their predictive role in disease manifestation.Indeed, the multivariate logistic regression model identified cluster 3, mainly found in BA.1 and BA.2 lineages and composed of three spike mutations, as positively associated with the worst clinical manifestation, together with comorbidities.Differently, BA.5 lineage showed a trend of negative association with moderate/severe manifestations.Thus, our multivariate analysis suggests that the ongoing evolution of the Omicron clade is associated with different trend of severe manifestations in children.This is confirmed also when major In line with this, a recent study indicates that Omicron lineages dominating between January and June 2022 caused a less severe disease 45 .
Factors that can contribute to this evidence include increased immunization (both natural and artificial) at the population level and functional properties of the new clade that might impact the pathogenesis of SARS-CoV-2 for humans [46][47][48] .
In the paediatric setting, these results could also explain the low risk of MIS-C observed after infection by Omicron clade 49 .
Our study has some limitations.Limitations to assessments of the proportions of asymptomatic cases should be noted: most of our paediatric population is tested only when children have symptoms, so relatively few asymptomatic infections are recorded.No information is available regarding the vaccination status.Another limitation is the limited presence of BQ.1 and the absence of the recombined Omicron forms (for example XBB or XBB.1.5),at the time of the study still absent.Since the end of the study (November 2022) to the time of writing (April, 2023) there have been 108 SARS-CoV-2 new diagnoses meeting the criteria defined in "Methods" section (i.e.Ct < 29 or COI > 1000).Considering the estimated prevalence of BQ.1 and XBB forms in these last months 50 , samples belonging to these Omicron forms would be about 60-70.
In conclusion, these findings underscored the widespread circulation of SARS-CoV-2 Omicron variant among children, particularly those under the age of one, even though no notable difference could be identified in their clinical outcomes compared to older age groups.Additionally, the study shed light on low-abundant mutations and their impact on evolutionary processes, suggesting a potential role for viral diversification in influencing disease severity.

Sample collection, and epidemiological data
This retrospective observational study, intended as follow-up of a previous work (Alteri et al., Scientific Report 2022 20 ), included 657 SARS-CoV-2-positive nasopharyngeal-swabs, collected from 657 patients aged ≤ 12 years referred for SARS-CoV-2 diagnosis at Bambino Gesù Children Hospital from December 2021 to early November 2022.
Demographics, epidemiological and clinical data were obtained retrospectively by pseudonymized electronic medical records.
As the previous paper 20 , the study protocol was approved by local Research Ethics Committee of Ospedale Pediatrico Bambino Gesù IRCCS (prot.2384_OPBG_2021), and was conducted under the principles of the 1964 Declaration of Helsinki.Informed consent was waived by the Ethics Committee of Ospedale Pediatrico Bambino Gesù IRCCS following the hospital regulations on observational retrospective studies.
The severity of SARS-CoV-2 infection was defined according to Dong et al., Pediatrics.2020 51 and based on the clinical features, laboratory testing, and chest radiograph imaging.Asymptomatic, mild and moderate/severe infections were defined according to Alteri et al., Scientific Report 2022 20 .

Virus amplification and sequencing
Viral RNAs were extracted from nasopharyngeal swabs by using QIAamp Viral RNA Mini Kit, followed by purification with Agencourt RNAClean XP beads.Both the concentration and the quality of all isolated RNA samples were measured and checked with the Nanodrop.
SNP variants were called with freebayes (v1.3.2) 54 and all SNPs having a minimum supporting read frequency of 2% with a depth ≥ 10 were retained.
Synonymous and non-synonymous SNPs characterizing Omicron lineages were defined as high-abundant mutations if characterized by a read frequency ≥ 40%, and low-abundant mutations if characterized by a read frequency between 2 and 40%.

Phylogenetic analysis and estimation of evolutionary rate
Consensus sequences were generated using the GitHub freely distributed software vcf_consensus_builder 55 considering all SNPs having a minimum read frequency of 40% (high-abundant mutations).SARS-CoV-2 lineages of the obtained consensus sequences were assigned according to Pangolin application (Pangolin v4.1.1,https:// github.com/ cov-linea ges/ pango lin) and then grouped in four major lineages (BA.1, BA.2, BA.5, BQ.1).Sequences were aligned using MAFFT v7.475 and manually inspected using Bioedit.The final alignment comprised 657 sequences of 29,801 nucleotides of length.In order to explore the phylogenetic structure of the paediatric epidemic and evolutionary rate of Omicron clade affecting population aged ≤ 12, a maximum likelihood (ML) phylogeny tree was performed with IqTree2 (v2.1.3) 56with 1000 bootstrap replicates, using the best-fit model of nucleotide substitution GTR + F + R3 inferred by ModelFinder 57 .The ML tree was inspected in TempEst 58 , in order to define the correlation between genetic diversity (root-to-tip divergence) and time of sample collection.www.nature.com/scientificreports/Bayesian coalescent methods were further performed, in order to define the phylogenetic structure of the paediatric epidemic against time.A Bayesian coalescent tree analysis was undertaken with BEAST (v1.10.4) 59 , using the GTR + I + G4 substitution model, inferred by Modeltest-NG (v0.1.7) 60,61, with an exponential population growth tree prior and uncorrelated relaxed molecular clock, under a noninformative continuous-time Markov chain (CTMC) reference prior using only paediatric sequences.Three independent chains were run for 50 million states and parameters and trees were sampled every 1000 states.Upon completion, chains were combined using LogCombiner after removing 10% of states as burn-in and convergence was assessed with Tracer (ESS > 100) 62 .Taxon sets were defined based on SARS-CoV-2 lineages and were used to estimate the time of their most recent common ancestor (tMRCA), as well as the rates of evolutionary change (expressed as nucleotide substitutions per site per year).
Annotation of the phylogenetic tree, including information about lineages, SARS-CoV-2 viral load, symptoms and hospitalization was performed with iTOL (v5) 63 .

Covariation analysis
The binomial-correlation coefficient (phi) was calculated for each pair of SNPs to assess the strength of co-variation among SNPs.Covariation analysis was conducted including all SARS-CoV-2 SNPs with a prevalence > 5% in the overall population.
The phi coefficient and p-value for all the possible pairwise combinations were calculated by using a script implemented in the R software, version 3.4.1.Statistically significant pairwise associations were considered those with p-value < 0.001 and phi > 0.3 and < − 0.3.
To analyze the covariation structure of the SNPs in more detail, average linkage hierarchical agglomerative clustering, reported as a dendrogram, was performed.The statistical robustness of the dendrogram was confirmed with a bootstrap analysis using 10.000 replications.Clusters with a bootstrap value equal or higher than 0.70 were considered well-supported.

Statistical analysis
Descriptive statistics were expressed as median values and interquartile range (IQR) for continuous data and number (percentage) for categorical data.To assess significant differences, chi-squared test for trend and Kruskal-Wallis were used for categorical and continuous variables, respectively.
A multivariate logistic regression analysis was performed to evaluate demographic and virus-related associated with disease severity.
Statistical analyses were performed with SPSS software package for Windows (version 23.0, SPSS Inc., Chicago, IL).A two-sided p-value < 0.05 was considered statistically significant.

Figure 1 .
Figure 1.Bayesian phylogenetic reconstruction incorporating date of diagnosis of the 657 SARS-CoV-2 sequences of 29,801 nucleotides of length obtained by population aged ≤ 12 years.SARS-CoV-2 genomes were highlighted in different colors against omicron lineages.Information regarding hospitalization and symptoms were also reported.Three independent chains were run for 50 million states, using the best-fit model of nucleotide substitution GTR + I + G4 with an uncorrelated relaxed molecular clock under a noninformative continuous-time Markov chain (CTMC) reference prior using only paediatric sequences.Parameters and trees were sampled every 1000 states.

Figure 2 .
Figure 2. Median number and Interquartile range of high-abundant (A) and low-abundant (B) SNPs observed against Omicron lineages.p-values were calculated by the Kruskal-Wallis test.SNPs single nucleotide polymorphism.

Figure 3 .
Figure 3. Dendrogram of correlated mutations.The dendrogram, obtained from average linkage hierarchical agglomerative clustering, shows clusters of mutations localized in different SARS-CoV-2 proteins.The length of branches reflects distances between mutations in the original distance matrix.Bootstrap values, indicating the significance of clusters, are reported in the boxes. https://doi.org/10.1038/s41598-024-55599-0 d Real-time reverse transcription PCR Ct (cycle threshold) values were obtained by AllplexTM 2019-nCoV Assay Seegene (target E, RdRp, N), and Xpert Xpress SARS-CoV-2 Assay, Cepheid (Target E and N2). e SARS-CoV-2 Ag COI (Cut Off Index) values were obtained by Roche SARS-CoV-2 Rapid Antigen Test and SD Biosensor COVID-19 Ag FIA.

Table 2 .
Clusters of positively correlated SNPs.SNP single nucleotide polymorphism.a Covariation frequency based on the prevalence of SNP 1. b Covariation frequency based on the prevalence of SNP 2. c Positive correlations with phi > 0.3 d All p-values for covariation were significant at a false discovery rate of 0.05.